Comparing Medline citations using modified N-grams: Table 1
Similar resources
Comparing Medline citations using modified N-grams
OBJECTIVE: We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information. MATERIALS AND METHODS: Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches which take account of the fact that the document may have been altered. These are: (1) d...
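The basic idea of comparing word n-grams between two documents can be illustrated with a minimal sketch. This is not the authors' method: the two n-gram modifications mentioned in the abstract are truncated in this listing and are not reproduced here. The function names, the tokenizer, the default n = 3, and the Jaccard overlap score are all assumptions chosen for illustration.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams in `text` (hypothetical helper)."""
    # Assumed tokenization: lowercase, keep runs of word characters only.
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(doc_a, doc_b, n=3):
    """Jaccard overlap of the two n-gram sets (an assumed similarity measure)."""
    grams_a = word_ngrams(doc_a, n)
    grams_b = word_ngrams(doc_b, n)
    if not grams_a or not grams_b:
        # A document shorter than n words has no n-grams to compare.
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

A pair of citations would then be flagged as a candidate duplicate when the score exceeds some threshold; the threshold value would have to be tuned on labeled data and is not given in the text above.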
Protein classification using modified n-grams and skip-grams.
MOTIVATION: Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce...
Hash Table Sizes for Storing N-Grams for Text Processing
N-grams have been widely investigated for a number of text processing tasks. However, n-gram-based systems often labor under the large memory requirements of naïve storage of the large vectors that describe the many n-grams that could potentially appear in documents. This problem becomes more severe as the number of documents (and hence the number of vectors to store and process) rises. A natura...
Using Argumentation to Retrieve Articles with Similar Citations from MEDLINE
The aim of this study is to investigate the relationships between citations and the scientific argumentation found in the abstract. We extracted citation lists from a set of 3200 full-text papers originating from a narrow domain. In parallel, we recovered the corresponding MEDLINE records for analysis of the argumentative moves. Our argumentative model is founded on four classes: PURPOSE, METHO...
Summarizing Drug Information in Medline Citations
Adverse drug events and drug-drug interactions are a major concern in patient care. Although databases exist to provide information about drugs, they are not always up-to-date and complete (particularly regarding pharmacogenetics). We propose a methodology based on automatic summarization to identify drug information in Medline citations and present results to the user in a convenient form. We ...
Journal
Journal title: Journal of the American Medical Informatics Association
Year: 2014
ISSN: 1067-5027,1527-974X
DOI: 10.1136/amiajnl-2012-001552